Skip to content

Conversation

@phoebewang
Copy link
Contributor

@llvmbot
Copy link
Member

llvmbot commented Mar 9, 2025

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

https://godbolt.org/z/rWYdqnjjx


Full diff: https://github.com/llvm/llvm-project/pull/130488.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+17-1)
  • (modified) llvm/test/CodeGen/X86/apx/cf.ll (+23-5)
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5fe7203c052d8..d83adf99ce6d3 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -5366,6 +5366,7 @@ bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
   MachineBasicBlock::reverse_iterator From =
       std::next(MachineBasicBlock::reverse_iterator(CmpInstr));
   for (MachineBasicBlock *MBB = &CmpMBB;;) {
+    SmallVector<MachineInstr *, 4> NDDInsts;
     for (MachineInstr &Inst : make_range(From, MBB->rend())) {
       // Try to use EFLAGS from the instruction defining %SrcReg. Example:
       //     %eax = addl ...
@@ -5441,13 +5442,28 @@ bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
           continue;
         }
 
+        // Try to replace NDD with NF instructions.
+        if (Subtarget.hasNF() &&
+            X86II::hasNewDataDest(Inst.getDesc().TSFlags) &&
+            Inst.registerDefIsDead(X86::EFLAGS, TRI)) {
+          NDDInsts.push_back(&Inst);
+          continue;
+        }
+
+        NDDInsts.clear();
+
         // Cannot do anything for any other EFLAG changes.
         return false;
       }
     }
 
-    if (MI || Sub)
+    if (MI || Sub) {
+      for (MachineInstr *NDD : NDDInsts) {
+        NDD->setDesc(get(X86::getNFVariant(NDD->getOpcode())));
+        NDD->removeOperand(NDD->getNumOperands() - 1);
+      }
       break;
+    }
 
     // Reached begin of basic block. Continue in predecessor if there is
     // exactly one.
diff --git a/llvm/test/CodeGen/X86/apx/cf.ll b/llvm/test/CodeGen/X86/apx/cf.ll
index a64d7df11a4d0..fc170ca5f2b2e 100644
--- a/llvm/test/CodeGen/X86/apx/cf.ll
+++ b/llvm/test/CodeGen/X86/apx/cf.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64 -mattr=+cf,+avx512f -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64 -mattr=+cf,+nf,+ndd,+avx512f -verify-machineinstrs | FileCheck %s
 
 define void @basic(i32 %a, ptr %b, ptr %p, ptr %q) {
 ; CHECK-LABEL: basic:
@@ -57,9 +57,8 @@ entry:
 define i64 @reduced_data_dependency(i64 %a, i64 %b, ptr %c) {
 ; CHECK-LABEL: reduced_data_dependency:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movq %rdi, %rcx
-; CHECK-NEXT:    subq %rsi, %rcx
-; CHECK-NEXT:    cfcmovnsq (%rdx), %rdi, %rax
+; CHECK-NEXT:    subq %rsi, %rdi, %rax
+; CHECK-NEXT:    cfcmovnsq (%rdx), %rdi, %rcx
 ; CHECK-NEXT:    addq %rcx, %rax
 ; CHECK-NEXT:    retq
 entry:
@@ -125,7 +124,7 @@ entry:
   ret void
 }
 
-define void @single_cmp(i32 %a, i32 %b, ptr %c, ptr %d) #2 {
+define void @single_cmp(i32 %a, i32 %b, ptr %c, ptr %d) {
 ; CHECK-LABEL: single_cmp:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    cmpl %esi, %edi
@@ -139,3 +138,22 @@ entry:
   tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %2, ptr %d, i32 2, <1 x i1> %1)
   ret void
 }
+
+define void @load_add_store(i32 %a, i32 %b, ptr %p) {
+; CHECK-LABEL: load_add_store:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    cmpl %esi, %edi
+; CHECK-NEXT:    cfcmovnew (%rdx), %ax
+; CHECK-NEXT:    {nf} incw %ax
+; CHECK-NEXT:    cfcmovnew %ax, (%rdx)
+; CHECK-NEXT:    retq
+entry:
+  %0 = icmp ne i32 %a, %b
+  %1 = insertelement <1 x i1> poison, i1 %0, i64 0
+  %2 = tail call <1 x i16> @llvm.masked.load.v1i16.p0(ptr %p, i32 2, <1 x i1> %1, <1 x i16> poison)
+  %3 = extractelement <1 x i16> %2, i64 0
+  %4 = add i16 %3, 1
+  %5 = insertelement <1 x i16> poison, i16 %4, i64 0
+  tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %5, ptr %p, i32 2, <1 x i1> %1)
+  ret void
+}

bool ClearsOverflowFlag = false;
bool ShouldUpdateCC = false;
bool IsSwapped = false;
bool HasCF = Subtarget.hasNF();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasCF->HasNF

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! done.

@KanRobert
Copy link
Contributor

KanRobert commented Mar 10, 2025

It seems the optimization does not rely on NDD?

@phoebewang
Copy link
Contributor Author

It seems the optimization does not rely on NDD?

You are correct! I thought only NDD instructions support NF. Thanks!


// Try to replace non-NF with NF instructions.
if (HasNF && Inst.registerDefIsDead(X86::EFLAGS, TRI)) {
unsigned NewOp = X86::getNFVariant(Inst.getOpcode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we don't need to store the opcodes of NF variants. Just setDesc(X86::getNFVariant(Inst.getOpcode())); at line 5654?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the table lookup is more expensive than a little memory space.

Copy link
Contributor

@KanRobert KanRobert Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it introduce more times of table lookup? The times seem equal for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to query twice if we don't store the value. We need query here to check if the instruction can turn into NF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay

@phoebewang phoebewang changed the title [X86][APX] Try to replace NDD with NF instructions when optimizeCompareInstr [X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr Mar 10, 2025
Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this test should be added in nf.ll instead of cf.ll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no nf.ll. The nf tests are sacttered in add/or/sub/...ll, so put it in cf.ll is ok since we happen to have nf condition lowering here :)

@phoebewang phoebewang merged commit 507e0c3 into llvm:main Mar 10, 2025
11 checks passed
@phoebewang phoebewang deleted the APX branch March 10, 2025 13:08
phoebewang added a commit to phoebewang/llvm-project that referenced this pull request Mar 12, 2025
Since llvm#130488, we have NF instructions when converting to three address instructions.
phoebewang added a commit that referenced this pull request Mar 13, 2025
…30969)

Since #130488, we have NF instructions when converting to three address
instructions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants